The distribution of initial estimates moderates the effect of social influence on the wisdom of the crowd

Whether, and under what conditions, groups exhibit “crowd wisdom” has been a major focus of research across the social and computational sciences. Much of this work has focused on the role of social influence in promoting the wisdom of the crowd versus leading the crowd astray and has resulted in conflicting conclusions about how social network structure determines the impact of social influence. Here, we demonstrate that it is not enough to consider the network structure in isolation. Using theoretical analysis, numerical simulation, and reanalysis of four experimental datasets (totaling 2885 human subjects), we find that the wisdom of crowds critically depends on the interaction between (i) the centralization of the social influence network and (ii) the distribution of the initial individual estimates. By adopting a framework that integrates both the structure of the social influence and the distribution of the initial estimates, we bring previously conflicting results under one theoretical framework and clarify the effects of social influence on the wisdom of crowds.


S1 Network models of collective estimation
Let θ be an unknown state of the world. Consider n agents indexed by i = 1, . . . , n, each endowed with a biased and noisy signal about θ. The signals are independent and identically distributed across the agents and constitute their initial estimates of the unknown θ:

a_{i,0} ∼ F^θ_{μ,σ}, i = 1, . . . , n. (S.1)

The distribution of the initial estimates in (S.1), F^θ_{μ,σ}, is parametrized by θ, μ, and σ. We think of μ as a location parameter (the center of the distribution) that biases the individual estimates away from θ; it captures the level of systematic bias in the population. We think of σ as a variance-proxy/shape parameter that determines the variation and tail-fatness of F^θ_{μ,σ}. In other words, σ can be interpreted as the amount of prior information a group has about the quantity, and it represents the level of demonstrability of the estimation task.
The agents interact in a group. Their group interactions can be modeled in a variety of ways, leading to a group aggregate a_n(w̄) that is a convex combination of the initial estimates:

a_n(w̄) := w̄^T ā_0, (S.2)

where w̄ is an entry-wise non-negative vector satisfying w̄^T 1 = 1. Vectors are denoted by a bar on top of their letters, and we use superscript T to denote matrix transpose.
In general, different agents' initial estimates will receive different weights in the collective estimate. A common method of modeling group interactions is through DeGroot-style iterated averaging, which has a long history in mathematical sociology and social psychology [1]. The origins of iterated averaging models can be traced to French's seminal work on "A Formal Theory of Social Power" [2], followed by Harary's investigation of the mathematical properties of the averaging model, including the consensus criteria and its relations to Markov chain theory [3]. The model was further popularized by DeGroot's seminal work [4] on linear opinion pools and belief exchange dynamics. In a typical iterated averaging setup, an agent's estimate at time t is given by a weighted average of her neighboring estimates at time t − 1:

a_{i,t} = Σ_{j=1}^n W_{ij} a_{j,t−1}. (S.3)

In matrix notation, we have ā_t = W ā_{t−1}, where ā_t = (a_{1,t}, a_{2,t}, . . . , a_{n,t})^T and W = [W_{ij}] is the matrix of weights, which we refer to as the social influence matrix. Under standard conditions on W, the iterations converge to a common limiting value for all agents, which implies a consensus on the collective estimate (S.2). In several experimental settings [6,7,8,9], human participants get to revise their numerical estimates only a few times, and the collective estimate is then calculated by averaging the revised estimates. Let us denote the number of communication rounds in such a scenario by τ. Using (S.3) to model the revision of the numerical estimates, we again arrive at a model that gives the collective estimate as a convex combination of the initial estimates, a_n(w̄_τ) = w̄_τ^T ā_0, where the transposed vector of weights is given by w̄_τ^T = (1/n) 1^T W^τ.
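To make the mapping from τ rounds of averaging to the weight vector concrete, here is a minimal sketch in Python; the function name and the 4-agent cycle are illustrative, not the paper's code:

```python
import numpy as np

def collective_weights(W: np.ndarray, tau: int) -> np.ndarray:
    """Return w_tau such that a_n(w_tau) = w_tau^T a_0, when the collective
    estimate is the average of the estimates after tau rounds of (S.3)."""
    n = W.shape[0]
    W_tau = np.linalg.matrix_power(W, tau)
    return W_tau.T @ (np.ones(n) / n)   # w_tau^T = (1/n) 1^T W^tau

# Example: a 4-agent cycle where each agent averages itself and its two neighbors.
n = 4
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1 / 3

a0 = np.array([1.0, 2.0, 4.0, 9.0])    # initial estimates
w_tau = collective_weights(W, tau=3)
print(w_tau, w_tau @ a0)                # convex weights and the aggregate
```

Since W is row-stochastic, the entries of w̄_τ are non-negative and sum to one, so the aggregate is indeed a convex combination of the initial estimates.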

S1.1 ω: Parameterizing a class of networks by their centralization
Motivated by our interest in comparing the collective estimation performance of centralized and decentralized networks, we focus our attention on a class of social influence network structures for which the collective estimate can be written as follows:

a_n(ω) = ω a_{1,0} + (1 − ω) (1/n) Σ_{i=1}^n a_{i,0}, (S.5)

where ω is a measure of influence centralization, with ω = 1 representing a fully centralized social influence structure (w_1 = 1 and w_2 = . . . = w_n = 0) and ω = 0 corresponding to a fully decentralized social influence structure (w_1 = . . . = w_n = 1/n). Indeed, with w̄ as the centrality vector, the parameter ω corresponds to the Freeman centralization of the underlying network with social influence matrix W. By varying ω ∈ [0, 1], we can interpolate between the two extremes: full centralization, ω = 1, and complete decentralization, ω = 0. This class, although not all-encompassing, is ideal for addressing the central question of interest in this work.
The networks in this class are such that one agent, i = 1, is distinguished by a higher influence, and all others, i > 1, have an equal but lower influence. Networks in this class include cases of practical and empirical interest, such as star networks and circular lattices. All networks in Figure 1.B in the main text belong to this class.
Subsequently, the network centralization parameter ω in (S.5) for the star and cycle networks is given by ω_⋆ = (n − 2)/(3n − 2) and ω_o = 0, respectively. Note that as n → ∞, ω_⋆ → 1/3. This, together with the fact that experimental studies have used the star topology to test collective estimation in centralized structures [7], motivates our choice of ω = 1/3 in numerical simulations and empirical analysis. In section S4, we show that our results are robust to this choice of ω.
Although our main proposition in section S2.1 gives a lower bound that is valid for any n, most of the subsequent analysis concerns the limiting behavior of the collective estimates as n → ∞. Our results extend to networks with a finite (non-increasing) collection of influential agents. To accommodate such cases, one would replace a_{1,0} in (S.5) with the (possibly weighted) average of the initial estimates of the k influential agents, for some fixed constant k. One can similarly consider generalizations where the remaining, non-influential agents have unequal weights that all vanish (go to zero) as n → ∞.

S1.2 Ω: Will the estimation context benefit from centralization?
We consider a case where agents are randomly placed in the social influence network. This is typical of many experimental setups [8,7,6]. Let E^θ_{μ,σ} be the expectation with respect to the random draws of the n i.i.d. initial estimates, a_{1,0}, . . . , a_{n,0} ∼ F^θ_{μ,σ}, and let P^θ_{μ,σ} be the corresponding probability measure. The expected square-root error, mean absolute error, and mean squared error of the collective estimate (S.6) are the expected losses E^θ_{μ,σ}[ℓ(a_n(w̄), θ)] with ℓ(a, θ) = |a − θ|^{1/2}, |a − θ|, and (a − θ)², respectively. In order to investigate the interaction between the network structure and the distribution of the initial estimates (i.e., the estimation context: a population of agents performing a particular estimation task), we propose the following measure of how the collective estimate a_n(w̄) = w̄^T ā_0 performs against a fully decentralized aggregate a_n(0):

Ω_n(w̄, F^θ_{μ,σ}) := P^θ_{μ,σ}[|a_n(w̄) − θ| < |a_n(0) − θ|],

where w̄ is the centrality vector in (S.4). Restricting attention to the class of networks in subsection S1.1, with a_n(ω) = ω a_{1,0} + (1 − ω)(1/n) Σ_{i=1}^n a_{i,0}, we can write:

Ω_n(ω, F^θ_{μ,σ}) = P^θ_{μ,σ}[|a_n(ω) − θ| < |a_n(0) − θ|].

This measure, Ω_n(ω, F^θ_{μ,σ}), corresponds to the probability that a network with social influence centralization ω > 0 outperforms a decentralized network with ω = 0 in absolute error performance. Analogous probabilities can be written for the other performance measures. Our focus in section S2 is on Ω_n, as the probability of the outcome of interest, to understand if and when the estimation context will benefit from centralization. In section S2, we present a theoretical and numerical analysis of the properties of Ω_n. In particular, we show how the behavior of Ω_n varies with the estimation context, i.e., the distribution of the initial estimates.
In section S2.1, we present a theoretical lower bound on Ω_n(ω, F^θ_{μ,σ}) and analyze its behavior for various classes of distributions F^θ_{μ,σ}. In section S2.2, we supplement these findings with numerical analysis and simulations. We present our numerical results in the main text for ω = 1/3 and n = 50. In section S4, we show that our results are robust to these choices of ω and n. Of note, Ω_n is concerned only with the probability of the following event: the collective estimate generated by the agents interacting in a centralized influence structure will be closer to the truth than the collective estimate generated by the agents in a decentralized structure. This is not the same as comparing the expected loss or error magnitudes. In section S4, Figure S4, we show the results for various choices of loss function.
In section S3, we propose a feature of the estimation context that can be empirically measured to predict when social influence improves collective estimation accuracy in prior empirical studies. The proposed feature, R, is based on how heavy-tailed the distribution of the initial estimates is: it is defined as the relative fit (measured in log-likelihood) of a log-normal versus a normal distribution to the empirical distribution of the initial estimates. In particular, in experimental conditions with no social interaction, an external observer polls each of the participants for their opinion. Therefore, in the absence of social influence, the aggregate is given by a_n(0) = (1/n) Σ_{i=1}^n a_{i,0}, which is equivalent to a fully decentralized influence structure. On the other hand, in the presence of social influence, the participants revise their estimates as a result of their social interactions, thus leading to an aggregate that is a weighted average of the initial estimates, a_n(w̄) = Σ_{i=1}^n w_i a_{i,0}. Hence, social interaction leads to a collective estimate that is less decentralized. In our model, we capture this case by a_n(ω) for some ω > 0. Our empirical results show that R significantly predicts when the latter is more accurate than a_n(0).
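As a concrete illustration of how R can be computed from one task's initial estimates, the following sketch fits both candidate distributions by maximum likelihood and returns the difference in total log-likelihood; the function name and the choice to fix the log-normal location at zero are our assumptions, not necessarily the papers' exact procedure:

```python
import numpy as np
from scipy import stats

def relative_fit_R(estimates) -> float:
    """Log-likelihood of a log-normal MLE fit minus that of a normal MLE fit.
    Positive values indicate the heavier-tailed (log-normal) model fits better."""
    x = np.asarray(estimates, dtype=float)
    mu, sd = stats.norm.fit(x)                        # normal MLE
    ll_norm = stats.norm.logpdf(x, mu, sd).sum()
    shape, loc, scale = stats.lognorm.fit(x, floc=0)  # log-normal MLE, location at 0
    ll_lognorm = stats.lognorm.logpdf(x, shape, loc, scale).sum()
    return ll_lognorm - ll_norm

rng = np.random.default_rng(0)
print(relative_fit_R(rng.lognormal(mean=3.0, sigma=1.0, size=200)))  # typically > 0
print(relative_fit_R(rng.normal(loc=50.0, scale=5.0, size=200)))     # typically < 0
```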

S2.1 The main proposition
Motivated by the empirical literature that poses estimation questions to human participants, we focus on θ > 0 and distributions F^θ_{μ,σ} with support over the positive reals. Fix θ > 0, 0 < ω < 1, and δ > θ/(1 − ω), and consider the event E_1 = {a_{1,0} < δ}. Note that for many distributions we can make P^θ_{μ,σ}(E_1) arbitrarily close to one by taking δ large enough. Next consider the event E_n = {a_n(0) > δ + a_{1,0}/n}. Note that E_n implies a_n(0) > δ; furthermore, E_1 and E_n are independent events, since E_n = {(1/n) Σ_{i=2}^n a_{i,0} > δ} depends only on a_{2,0}, . . . , a_{n,0}. On the other hand, conditioned on the events E_1 and E_n, we have:

a_n(ω) = ω a_{1,0} + (1 − ω) a_n(0)
       < ω δ + (1 − ω) a_n(0)
       < ω a_n(0) + (1 − ω) a_n(0) = a_n(0),

where in the second line we have used δ > a_{1,0}, and in the third line we have used a_n(0) > δ; moreover, a_n(ω) ≥ (1 − ω) a_n(0) > (1 − ω) δ > θ. Hence, conditioned on E_1 ∩ E_n, we have θ < a_n(ω) < a_n(0), so that |a_n(ω) − θ| < |a_n(0) − θ|, i.e., the centralized network outperforms the decentralized one. We can thus bound Ω_n(ω, F^θ_{μ,σ}), the probability that a social influence network with centralization ω outperforms a decentralized one (ω = 0) in absolute error measure:

Ω_n(ω, F^θ_{μ,σ}) ≥ P^θ_{μ,σ}[E_1 ∩ E_n] = P^θ_{μ,σ}[E_1] P^θ_{μ,σ}[E_n]. (S.7)

Since the initial estimates are positive, the sum Σ_{i=2}^n a_{i,0} exceeds nδ whenever its largest term does, whence

P^θ_{μ,σ}[E_n] = P^θ_{μ,σ}[Σ_{i=2}^n a_{i,0} > nδ] ≥ P^θ_{μ,σ}[max_{2≤i≤n} a_{i,0} > nδ] = 1 − F^θ_{μ,σ}(nδ)^{n−1}. (S.8)

Together with (S.7), we arrive at our main proposition:

Ω_n(ω, F^θ_{μ,σ}) ≥ sup_{δ > θ/(1−ω)} F^θ_{μ,σ}(δ) (1 − F^θ_{μ,σ}(nδ)^{n−1}). (S.9)

To proceed, let us denote the lower bound on the right-hand side of (S.9) by Ω̲_n(ω, F^θ_{μ,σ}). Some observations are now in order regarding the behavior of the lower bound, Ω̲_n(ω, F^θ_{μ,σ}). First, note that Ω̲_n(ω, F^θ_{μ,σ}) is decreasing in θ and ω. Second, the asymptotic behavior of Ω̲_n(ω, F^θ_{μ,σ}) as n → ∞ is determined by F^θ_{μ,σ}(nδ)^{n−1}. Note that F^θ_{μ,σ}(nδ) ≤ 1 and F^θ_{μ,σ}(nδ) → 1 as n → ∞ for any δ > 0. Therefore, if F^θ_{μ,σ}(nδ) → 1 at a slow enough rate that F^θ_{μ,σ}(nδ)^{n−1} is bounded away from one for all n, then F^θ_{μ,σ}(δ)(1 − F^θ_{μ,σ}(nδ)^{n−1}) is bounded away from zero as n → ∞. Subsequently, for various classes of distributions with slow enough tail decay, we can establish nontrivial lower bounds Ω̲_n(ω, F^θ_{μ,σ}) > 0. In subsections S2.1.1 to S2.1.3, we give conditions on the estimation context, i.e., the distribution of the initial estimates (parameterized by μ, σ, and θ), for well-known heavy-tailed distributions such that Ω_n, Ω̲_n → 1 as n → ∞. In subsection S2.1.4, we discuss the general properties of heavy-tailed distributions that make them relevant to our proposition. Finally, in subsection S2.1.5, we present countervailing arguments for thin-tailed distributions.
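The lower bound in (S.9) is straightforward to evaluate numerically by a grid search over δ. The sketch below does so for the Pareto family of subsection S2.1.1 (as reconstructed here, with tail exponent 1/σ); the grid limits and function names are illustrative:

```python
import numpy as np

def pareto_cdf(x, theta, mu, sigma):
    """Pareto CDF of (S.10): location theta*e^mu, tail exponent 1/sigma."""
    loc = theta * np.exp(mu)
    return np.where(x < loc, 0.0, 1.0 - (loc / np.maximum(x, loc)) ** (1.0 / sigma))

def omega_lower_bound(n, omega, theta, mu, sigma, num_grid=10_000):
    """Grid-search approximation of sup_{delta > theta/(1-omega)} F(delta)*(1 - F(n*delta)^(n-1))."""
    deltas = np.geomspace(theta / (1 - omega) * (1 + 1e-9), 1e6 * theta, num_grid)
    vals = pareto_cdf(deltas, theta, mu, sigma) * \
           (1.0 - pareto_cdf(n * deltas, theta, mu, sigma) ** (n - 1))
    return vals.max()

# With a heavy tail (sigma > 1) the bound grows toward 1 as n increases;
# with sigma < 1 it collapses toward 0.
for n in (50, 500, 5000):
    print(n, omega_lower_bound(n, omega=1/3, theta=2.0, mu=0.0, sigma=1.5),
             omega_lower_bound(n, omega=1/3, theta=2.0, mu=0.0, sigma=0.5))
```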

S2.1.1 Pareto (power-law)
Pareto or power-law distributions are archetypal heavy-tailed distributions, characterized by their polynomial tail decay. Consider a Pareto distribution with location parameter θe^μ and tail exponent 1/σ, defined as follows:

F^θ_{μ,σ}(x) = 1 − (θe^μ/x)^{1/σ}, x ≥ θe^μ. (S.10)

For Pareto distributions we have:

F^θ_{μ,σ}(nδ)^{n−1} = (1 − (θe^μ/(nδ))^{1/σ})^{n−1} ≃ exp(−n^{1−1/σ} (θe^μ/δ)^{1/σ}), (S.11)

where we used the n → ∞ asymptotic equality (1 − c_n/n)^{n−1} ≃ e^{−c_n} for c_n = o(n). (S.12)

We now consider the three distinct n → ∞ limiting behaviors that arise for σ > 1, σ = 1, and σ < 1:

• For σ > 1, as n → ∞ we get F^θ_{μ,σ}(nδ)^{n−1} → 0 for every fixed δ, and letting δ → ∞ we conclude that

Ω̲_n(ω, F^θ_{μ,σ}) → 1

for σ > 1 and any θ > 0, 0 < ω < 1, and real μ.
• Replacing σ = 1 in (S.11), the lower bound Ω̲_n(ω, F^θ_{μ,1}) can be calculated as follows:

lim_{n→∞} Ω̲_n(ω, F^θ_{μ,1}) = sup_{δ > θ/(1−ω)} (1 − θe^μ/δ)(1 − e^{−θe^μ/δ}).

To see why, denote x = θe^μ/δ. The maximum of 1 − x − e^{−x} + x e^{−x} = (1 − x)(1 − e^{−x}) occurs at x_⋆ satisfying −1 + 2e^{−x_⋆} − x_⋆ e^{−x_⋆} = 0. The latter has a unique solution over the positive reals, given by x_⋆ = 2 − W_0(e²) ≈ 0.443, where W_0 denotes the principal branch of the Lambert W function; substituting in 1 − x − e^{−x} + x e^{−x} gives the maximum value W_0(e²) + 1/W_0(e²) − 2 ≈ 0.199, realized at δ_⋆ = θe^μ/(2 − W_0(e²)). If δ_⋆ ≤ θ/(1 − ω), then the supremum is achieved at δ = θ/(1 − ω), at a value that is strictly less than W_0(e²) + 1/W_0(e²) − 2.

• For σ < 1, F^θ_{μ,σ}(nδ)^{n−1} → 1 for every fixed δ, and the lower bound vanishes: Ω̲_n(ω, F^θ_{μ,σ}) → 0 as n → ∞.
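The Lambert-W characterization of the σ = 1 optimum can be verified numerically; in the sketch below, `lambertw` is SciPy's Lambert W function, and the grid comparison is a sanity check of our reconstruction:

```python
import numpy as np
from scipy.special import lambertw

W = np.real(lambertw(np.exp(2)))       # W_0(e^2) ~ 1.5571
x_star = 2.0 - W                       # stationary point, ~0.4429
f = lambda x: (1.0 - x) * (1.0 - np.exp(-x))

grid = np.linspace(1e-6, 1.0, 1_000_001)
print(x_star, f(x_star), W + 1.0 / W - 2.0)    # ~0.4429, ~0.1992, ~0.1992
print(abs(f(grid).max() - f(x_star)) < 1e-9)   # numerical maximum agrees
```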

S2.1.2 Log-Laplace
Jayles et al. [10] point out that the log-Laplace distribution provides a better fit to the empirically measured distribution of the initial estimates, compared to the log-Cauchy [11] or log-normal distributions. Here, we analyze the asymptotic behavior of the proposed lower bound as n → ∞ when the initial estimates are distributed according to a log-Laplace distribution with parameters log θ + μ and σ:

F^θ_{μ,σ}(x) = (1/2) exp((log x − log θ − μ)/σ), if log x < log θ + μ,
F^θ_{μ,σ}(x) = 1 − (1/2) exp(−(log x − log θ − μ)/σ), if log x ≥ log θ + μ. (S.14)

For n large enough, we have log(nδ/θ) > μ, and using the same asymptotics as in (S.12) we get:

F^θ_{μ,σ}(nδ)^{n−1} ≃ exp(−(1/2) n^{1−1/σ} (θe^μ/δ)^{1/σ}),

so that for σ > 1 the bound approaches F^θ_{μ,σ}(δ), which is increasing in δ and goes to one as δ increases to ∞. Hence,

Ω̲_n(ω, F^θ_{μ,σ}) → 1 for σ > 1.

Finally, for σ = 1, we have:

lim_{n→∞} Ω̲_n(ω, F^θ_{μ,1}) = sup_{δ > θ/(1−ω)} (1 − θe^μ/(2δ))(1 − e^{−θe^μ/(2δ)}),

valid on the upper branch of (S.14), i.e., when δ ≥ θe^μ.

Optimizing as in the Pareto case, with x = θe^μ/(2δ), gives the same maximum value, W_0(e²) + 1/W_0(e²) − 2, now realized at δ_⋆ = θe^μ/(2(2 − W_0(e²))). We summarize the above results in the following asymptotic characterization of the lower bound for log-Laplace distributions, with a phase transition at σ = 1:

lim_{n→∞} Ω̲_n(ω, F^θ_{μ,σ}) = 1 for σ > 1; sup_{δ > θ/(1−ω)} (1 − θe^μ/(2δ))(1 − e^{−θe^μ/(2δ)}) for σ = 1; 0 for σ < 1. (S.15)

In Figure S1.B, top, we have plotted (S.15) for ω = 1/3, θ = 2, and n = 50. Comparing with the direct numerical simulation in Figure S1.B, bottom, shows how the bound gets tighter for large σ.
S2.1.3 Log-normal

Here, we analyze the case where the initial estimates are distributed according to a log-normal distribution with parameters log θ + μ and σ:

F^θ_{μ,σ}(x) = Φ((log x − log θ − μ)/σ), x > 0, (S.16)

where Φ is the standard normal cumulative distribution function. We next apply the following control over the Gaussian tail,

1 − Φ(t) ≥ t/(√(2π)(t² + 1)) e^{−t²/2}, t > 0,

to obtain:

1 − F^θ_{μ,σ}(nδ) ≥ t_n/(√(2π)(t_n² + 1)) e^{−t_n²/2}, with t_n = (log(nδ/θ) − μ)/σ.

We next choose δ = δ_n, with θ, μ, and σ fixed, such that (log(nδ_n/θ) − μ)/σ → ∞ as n → ∞.
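The Gaussian tail control used above is the standard lower bound 1 − Φ(t) ≥ φ(t) t/(t² + 1), where φ is the standard normal density; a one-line numerical check (a sanity test of our reconstruction, not part of the original analysis):

```python
import numpy as np
from scipy.stats import norm

t = np.linspace(0.1, 10, 1000)
lower = t / (np.sqrt(2 * np.pi) * (t ** 2 + 1)) * np.exp(-t ** 2 / 2)
print(np.all(norm.sf(t) >= lower))   # True: the bound holds on the whole grid
```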

S2.1.4 Other heavy-tailed distributions
Many empirical studies [8,11,12,13,14] point out a heavy-tailed distribution for the numerical estimates (with a few estimates that fall on a fat right tail). Following the proof of the main proposition, we pointed out that for heavy-tailed distributions where the complementary CDF F̄^θ_{μ,σ}(nδ) := 1 − F^θ_{μ,σ}(nδ) decreases slowly, we can provide non-trivial lower bounds on Ω_n that remain bounded away from zero, Ω̲_n(ω, F^θ_{μ,σ}) > 0, even as n → ∞. In fact, if F̄^θ_{μ,σ}(nδ) decreases at a rate slower than 1/n, i.e., n F̄^θ_{μ,σ}(nδ) → ∞, then F^θ_{μ,σ}(nδ)^{n−1} → 0 as n → ∞. For such slowly decaying tails, the supremum in (S.9) is achieved as δ → ∞, and we can guarantee that Ω_n ≍ Ω̲_n ≍ 1; hence, the proposed lower bound is asymptotically tight.

Here, we identify a second way in which our proposed lower bound is tighter for heavy-tailed distributions. To this end, let us revisit (S.8), a critical step in deriving the proposed lower bound:

P^θ_{μ,σ}[Σ_{i=2}^n a_{i,0} > nδ] ≥ P^θ_{μ,σ}[max_{2≤i≤n} a_{i,0} > nδ].

This inequality is at the heart of the so-called "catastrophe principle" [15, Chapter 3] that applies to many heavy-tailed distributions. Intuitively, this principle entails that when one observes a larger than expected average value for a collection of heavy-tailed random variables, this observation is most likely explained by the existence of a very large sample in the collection, i.e., a "catastrophe". The countervailing explanation, which prevails for light-tailed random variables, is that "most" of the samples in the collection happen to be larger than expected. Formally, the distribution F^θ_{μ,σ} of the initial estimates is said to satisfy the catastrophe principle [15, Definition 3.1] if, for any n:

lim_{t→∞} P^θ_{μ,σ}[max_{1≤i≤n} a_{i,0} > t] / P^θ_{μ,σ}[Σ_{i=1}^n a_{i,0} > t] = 1.

The preceding condition is equivalent to having:

lim_{t→∞} P^θ_{μ,σ}[Σ_{i=1}^n a_{i,0} > t] / (n F̄^θ_{μ,σ}(t)) = 1.

The latter is the defining property of the subexponential family of distributions, which includes many common classes of heavy-tailed distributions, such as those considered in subsections S2.1.1 to S2.1.3. Setting t = nδ and letting n → ∞, we obtain that if F^θ_{μ,σ} is a member of the subexponential family, then the inequality in (S.8) holds asymptotically with equality. Hence, for such distributions belonging to the subexponential family, our proposed lower bound is asymptotically tight inasmuch as P^θ_{μ,σ}[E_1 ∩ E_n] ≍ F^θ_{μ,σ}(δ)(1 − F^θ_{μ,σ}(nδ)^{n−1}), and the only way in which our lower bound may be loose is through (S.7), i.e., if Ω_n(ω, F^θ_{μ,σ}) > P^θ_{μ,σ}[E_1 ∩ E_n] for all δ > θ/(1 − ω) as n → ∞.

It is worth noting that many light-tailed distributions portray an opposite picture, referred to as the "conspiracy principle" in [15, Definition 3.2], formally defined as follows:

lim_{t→∞} P^θ_{μ,σ}[max_{1≤i≤n} a_{i,0} > t] / P^θ_{μ,σ}[Σ_{i=1}^n a_{i,0} > t] = 0, for all n ≥ 2.

As an example, suppose that the initial estimates are exponentially distributed with mean θe^μ and the following tail probability:

F̄^θ_μ(x) = e^{−x/(θe^μ)}, x > 0.

Then their sum follows an Erlang distribution, satisfying:

P^θ_μ[Σ_{i=1}^n a_{i,0} > t] = e^{−t/(θe^μ)} Σ_{k=0}^{n−1} (t/(θe^μ))^k / k!,

which dominates P^θ_μ[max_{1≤i≤n} a_{i,0} > t] ≤ n e^{−t/(θe^μ)} by a factor polynomial in t, so the ratio above indeed tends to zero.
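The contrast between the two principles is easy to see in simulation. The sketch below estimates P[max > t | sum > t] for a Pareto sample (tail exponent 1) and an exponential sample; the distributions, sample sizes, and threshold choice are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 500_000

for name, sampler in [("Pareto(exponent 1)", lambda s: 1.0 / rng.random(s)),
                      ("Exponential",        lambda s: rng.exponential(1.0, s))]:
    x = sampler((reps, n))
    t = np.quantile(x.sum(axis=1), 0.999)   # a "larger than expected" level
    big_sum = x.sum(axis=1) > t
    big_max = x.max(axis=1) > t
    # near 1 for the heavy tail (catastrophe), near 0 for the light tail (conspiracy)
    print(name, big_max[big_sum].mean())
```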

S2.1.5 Distributions with strong tail decay and classic accounts of the wisdom of crowds
It is instructive to investigate the behavior of the lower bound for light-tailed distributions as well. Sub-Gaussian distributions are a class of probability distributions with strong tail decay (at least as fast as a Gaussian). Suppose x is a random variable with mean μ + θ and cumulative distribution F^θ_{μ,σ}. Furthermore, suppose that x − μ − θ is sub-Gaussian with variance-proxy parameter σ, thereby satisfying:

P^θ_{μ,σ}[|x − μ − θ| > t] ≤ 2e^{−t²/(2σ²)}, t > 0.

On the other hand, we have 1 + F^θ_{μ,σ}(nδ) + F^θ_{μ,σ}(nδ)² + . . . + F^θ_{μ,σ}(nδ)^{n−2} ≤ n − 1, which we can combine with the above, via the factorization 1 − F(nδ)^{n−1} = (1 − F(nδ))(1 + F(nδ) + . . . + F(nδ)^{n−2}), to get that

1 − F^θ_{μ,σ}(nδ)^{n−1} ≤ 2(n − 1) e^{−(nδ − μ − θ)²/(2σ²)} → 0, as n → ∞, for any δ > 0.
Therefore, there is no set of parameters μ and σ that leads to a non-trivial asymptotic lower bound on Ω_n for random variables with sub-Gaussian tails: Ω̲_n(ω, F^θ_{μ,σ}) ≍ 0 for all θ, μ, σ. As an example, consider the folded Gaussian distribution, defined as the absolute value of a normally distributed random variable with mean θe^μ and variance σ²:

F^θ_{μ,σ}(x) = Φ((x − θe^μ)/σ) + Φ((x + θe^μ)/σ) − 1, x ≥ 0, (S.21)

where Φ is the standard normal cumulative distribution function. In Figure S1.D, bottom, we have plotted Ω_n(ω, F^θ_{μ,σ}) with ω = 1/3, θ = 2, n = 50, and initial estimates following a folded-Gaussian distribution.
There is no range of the distribution parameters, μ and σ, for which Ω_n increases above 0.6. Indeed, for such light-tailed distributions, admitting finite first and second moments, we can show that the limiting expected absolute error of the collective estimate with centralization ω, a_n(ω), is higher than that of the decentralized baseline, a_n(0).

We can repeat the same calculations for the expected mean squared errors as well:

E^θ_{μ,σ}[(a_n(ω) − θ)²] = (E^θ_{μ,σ}[a_{1,0}] − θ)² + Var(a_n(ω)).

Indeed, among all convex combinations of the initial estimates, the simple average, a_n(0), has the minimum variance. Since all estimators in this class, a_n(ω) for ω ≥ 0, have the same expected value, the simple average, a_n(0), is, in fact, the mean squared error (MSE) minimizer in this class. In this subsection, we point out that even though the variance/MSE of a_n(ω) is minimized at ω = 0, a_n(ω), with ω > 0 fixed, can "often" outperform a_n(0), i.e., fall closer to the truth, θ. We show this by lower bounding the probability, Ω_n, that |a_n(ω) − θ| < |a_n(0) − θ|. Subsequently, we identify heavy-tailedness conditions that make this event likely and Ω_n large.
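For completeness, the variance claim can be made explicit. Under the weights implied by (S.5), and writing s² for the common variance of the initial estimates when it exists, a short calculation (our addition) gives:

```latex
\mathrm{Var}\big(a_n(\omega)\big)
  = s^2\left[\left(\omega + \frac{1-\omega}{n}\right)^{2}
  + (n-1)\left(\frac{1-\omega}{n}\right)^{2}\right]
  \;\longrightarrow\; \omega^{2}s^{2}
  \quad\text{as } n \to \infty,
\qquad
\mathrm{Var}\big(a_n(0)\big) = \frac{s^{2}}{n} \;\longrightarrow\; 0,
```

so any fixed ω > 0 carries an asymptotic variance penalty of ω²s² relative to the simple average.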
It is worth highlighting that although for μ = 0 the sample mean (simple average) is a minimum-variance unbiased estimator (MVUE), it is well known in statistics that the simple average is not "optimal" for closeness to the truth in many situations. For instance, in the presence of outliers or extreme values, the sample median is preferred to the sample mean due to its robustness properties [17]. Our calculations in subsections S2.1.1 to S2.1.4 are of a similar flavor, pointing out the superiority of a weighted average when the underlying distributions are heavy-tailed.
In subsection S2.2, we discuss direct numerical simulation of the value of Ω_n for various distribution classes. In sub-subsection S2.2.1, we identify other comparable right-skewness conditions for making Ω_n large, e.g., greater than 1/2, by analyzing the location of the medians of the two estimators, a_n(ω) for ω > 0 and a_n(0). In section S3, we show that the feature, Ω, that we identify from this theory has significant explanatory power for determining whether experimentally measured collective estimation outcomes improve after group members interact with each other.

S2.2 Numerical simulations
For numerical simulations, we have fixed θ = 2, ω = 1/3, and n = 50. The choice of ω = 1/3 is arbitrary, and our conclusions remain valid for any ω > 0, as verified by the robustness checks in section S4. This choice is motivated by our observation in subsection S1.1 that ω for a star network converges to 1/3 as n → ∞. This also allows us to juxtapose our simulations with common experimental setups that use the star topology as archetypal of centralized structures [7].
Note that with ω = 1/3 fixed, the dependence of Ω_n on the network structure is removed. Therefore, Ω_n(ω, F^θ_{μ,σ}) is entirely determined by the distribution of the initial estimates, F^θ_{μ,σ}, i.e., the estimation context. Here, we study our proposed task feature, Ω_n, numerically for a palette of empirically relevant distributions.
For any distribution of the initial estimates, F^θ_{μ,σ}, and number of agents, n, we calculate Ω_n using a Monte Carlo method. We sample n initial estimates and calculate the collective estimates, a_n(1/3) and a_n(0), using equation (S.5). If a_n(1/3) is closer to the truth, θ, than a_n(0), implying that a centralized network performed better than a decentralized network, then we add to our tally of Ω_n. We repeat this procedure N times, where N is large enough to allow the value of Ω_n to converge (see Simulation Procedure 1). The results in Figure S1 are obtained in this manner with θ = 2, n = 50, and N = 10,000 for four different distributions: Pareto (S.10), log-Laplace (S.14), log-normal (S.16), and folded Gaussian (S.21).
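A minimal sketch of this Monte Carlo procedure (our reconstruction; the log-normal sampler and parameter values are illustrative):

```python
import numpy as np

def omega_n(sampler, theta=2.0, omega=1/3, n=50, N=10_000, seed=0):
    """Monte Carlo estimate of the probability that a_n(omega) beats a_n(0)."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(N):
        a0 = sampler(rng, n)                             # n i.i.d. initial estimates
        a_dec = a0.mean()                                # a_n(0): decentralized
        a_cen = omega * a0[0] + (1 - omega) * a_dec      # a_n(omega), as in (S.5)
        wins += abs(a_cen - theta) < abs(a_dec - theta)  # centralized closer to truth?
    return wins / N

lognormal = lambda rng, n: rng.lognormal(mean=np.log(2.0) + 0.5, sigma=1.0, size=n)
print(omega_n(lognormal))   # fraction of runs where omega = 1/3 beats omega = 0
```

Sweeping μ and σ over a grid with this routine reproduces phase diagrams of the kind shown in Figure S1.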

S2.2.1 The effect of the systematic bias, µ
To see the effect of the log-normal distribution parameters, μ and σ, in a different light, it is instructive to study the behavior of the median of the random variable a_n(ω). In particular, we are interested in the location of Median[a_n(ω)] with respect to the truth, θ, as the distribution parameter μ is varied. We do so in the limit of large group sizes, n → ∞. Note that since log-normal distributions have finite moments, the strong law of large numbers applies. Hence, as n → ∞, a_n(0) converges almost surely to

E^θ_{μ,σ}[a_{1,0}] = exp(log θ + μ + σ²/2) = θ e^{μ + σ²/2}.

In particular, in the large-n limit we also have that Median[a_n(0)] = exp(log θ + μ + σ²/2).
Finally, it is worth noting that a similar argument applies to any right-skewed and heavy-tailed distribution for which the population mean exists and is greater than the population median.
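A sketch of the corresponding median simulation, under our reconstruction of the three bias regimes of Figure S2 (the μ values are illustrative picks, one per regime):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, omega, sigma, n, N = 2.0, 1/3, 1.0, 50, 10_000

# mu chosen in the three regimes: strong under-estimation, slight
# under-estimation (-sigma^2/2 < mu < 0), and over-estimation (mu > 0).
for mu in (-1.5, -0.45, 0.5):
    a0 = rng.lognormal(np.log(theta) + mu, sigma, size=(N, n))
    a_dec = a0.mean(axis=1)                          # a_n(0) draws
    a_cen = omega * a0[:, 0] + (1 - omega) * a_dec   # a_n(omega) draws
    print(mu, np.median(a_cen), np.median(a_dec), "truth:", theta)
```

Comparing the two printed medians against θ in each regime reproduces the qualitative ordering described in the Figure S2 caption below.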

S3 Empirical analysis of estimation contexts in prior work
To empirically illustrate the explanatory power of this theory, we use data from four published experiments [8,18,7,6], in which 2,885 participants organized into 99 independent groups completed a total of 54 estimation tasks, generating 15,562 individual estimates and 687 collective estimates. Each task induces a different distribution on the initial estimates that are measured empirically. Therefore, each task constitutes an estimation context in our framework, and we have a total of 54 estimation contexts.

Figure S2: Simulating the medians of a_n(ω) and a_n(0) for three different values of the systematic bias, μ. The distribution median, marked in blue on the x-axis, lags the distribution mean, marked in red. Subsequently, the median of a_n(ω) is always less than the median of a_n(0) for the distributions studied here. In this framing, there are three different levels of bias. Panel A: when the distribution of initial estimates significantly under-estimates the truth, μ < −σ²/2, the median of a_n(0) is closer to the truth, θ, than the median of a_n(ω); in this case, Ω_n < 1/2. Panel B: when the distribution of initial estimates slightly under-estimates the truth, −σ²/2 < μ < 0, the truth lies between the medians of a_n(ω) and a_n(0), and Ω_n ≈ 0.5. Panel C: when the distribution of initial estimates over-estimates the truth, μ > 0, the median of a_n(ω) is closer to the truth, leading to Ω_n > 1/2. In these simulations, we have fixed ω = 1/3, θ = 2, n = 50, and N = 10,000, where N is the number of samples used to simulate the median values numerically.

S3.1 Regression Analysis
Our main empirical analyses, shown in Figure 3 and Table S1, are based on mixed-effects models with random effects to account for the nested structure of the data.
In particular, the logistic regression equation for Figure 3.C (Table S1, Model 1) is:

logit(P[y_ijk = 1]) = b_0 + b_1 R_j + v_i + v_k + ε_ijk,

where y_ijk is a binary indicator of whether the i-th group on the j-th task improved its collective estimate after social interaction, b_0 is the fixed intercept, b_1 is the fixed coefficient for our proposed feature of the estimation context, v_i is the random coefficient for the i-th group, v_k is the random coefficient for the k-th experiment, and ε_ijk is a Gaussian error term.

The regression equation for Figure 3.D (Table S1, Model 2) is:

y_ijk = b_0 + b_1 R_j + b_2 I_ij + b_3 R_j I_ij + v_i + v_k + ε_ijk,

where y_ijk is the standardized (z-scored) absolute error of the revised collective estimate for the i-th group in the j-th estimation context, b_0 is the fixed intercept for the regression model, b_1 is the fixed coefficient for our proposed feature of the estimation context, I_ij ∈ {0, 1} is an indicator variable of whether social interaction has occurred or not, b_2 is the fixed coefficient for the social influence centralization, b_3 is the fixed coefficient for the interaction term between our proposed feature of the estimation context and influence centralization, v_i is the random coefficient for the i-th group, and v_k is the random coefficient for the k-th experiment. Finally, ε_ijk is a Gaussian error term.
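For readers who wish to reproduce this structure, a sketch of Model 2 in Python's statsmodels is below; the data file and the column names (z_abs_error, R, social, group, experiment) are hypothetical placeholders, and a mixed-effects logistic fit for Model 1 would require a different routine:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("collective_estimates.csv")   # hypothetical long-format table

model = smf.mixedlm(
    "z_abs_error ~ R * social",                # fixed effects: b0, b1, b2, b3
    data=df,
    groups=df["experiment"],                   # random effect for experiment
    re_formula="1",
    vc_formula={"group": "0 + C(group)"},      # variance component for group
)
print(model.fit().summary())
```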

S3.2 Regression Analysis Experiment-by-Experiment
In addition to our main regression analyses outlined above, we further analyzed the empirical data in subsets. Specifically, we re-fit the main regression models outlined in S3.1 to the data from each experiment (or study) individually. Whereas aggregating the data from the four published experiments [9,10,14,16] allowed us to conduct our analyses with higher statistical power across a wider range of the R parameter space, analyzing the data experiment-by-experiment allows us to generate a range of coefficients to further understand potential variability across experiment implementations (albeit with lower statistical power and narrower ranges of R; see Panels A and B in Figure 3 in the main text). In doing so, we indeed observe variation in both Model 1, the degree to which R explains the probability that a group improves after centralized social interaction, and Model 2, the degree to which the interaction between R and influence centralization explains the standardized absolute error.

S4 Robustness checks
Figure S3: Robustness checks of the simulation results by varying ω and n when calculating our proposed feature of the estimation context, Ω, for the log-normal distribution (panels: increasing the number of agents, n, and increasing the centralization, ω). We find that the qualitative behavior of the phase diagram is robust to these changes. Increasing n or ω leads to sharper transitions from low Ω to high Ω.

Table S2: Robustness checks for Model 1 (varying the task environment metric, i.e., the independent variable R derived from the empirical data) for the effect of the task environment on group performance after social interaction. Each datapoint is an experimental trial. Note that for the Hill estimate of the tail index, the average estimate across all possible thresholds was used as the independent variable, and lower estimates indicate heavier tails. The results are from a mixed-effects model with a random effect for the group. We find that the nature of the results is robust across the alternative metrics.

Table S3: Robustness checks for Model 2 (varying the task environment metric, i.e., the independent variable R derived from the empirical data) for the marginal effect of the interaction term between the centralization of influence and the task environment on group performance, in terms of standardized absolute error. Each datapoint is an experimental trial. Note that for the Hill estimate of the tail index, the average estimate across all possible thresholds was used as the independent variable, and lower estimates indicate heavier tails. The results are from a mixed-effects model with a random effect for the group and a random effect for the experiment/study.

Figure S4: In each case, we plot the ratio of the loss function evaluated in a centralized structure, ω > 0, over a decentralized structure, ω = 0. A ratio less than 1 indicates that the centralized network performs better than the decentralized network. The performance of the two influence structures can vary significantly as a function of the selected loss function. The choice of the loss function is typically application-dependent. For instance, if the reward for 'getting it right' is greater than the cost of being frequently wrong, as in domains where the loss and payoff are asymmetric, unbounded, or have a remote boundary [13,19], then the decentralized influence structure is more desirable when the dispersion is high. The initial estimates in these simulations are sampled from a log-normal distribution for a fixed number of agents (n = 50) and centralization level (ω = 1/3).